Chris Pollett > Old Classes > CS267
( Print View )

Student Corner:
  [Grades Sec2]

  [Submit Sec2]

  [Class Sign Up Sec2]

  [
Lecture Notes]
  [Discussion Board]

Course Info:
  [Texts & Links]
  [Topics/Outcomes]
  [Outcomes Matrix]
  [Grading]
  [HW/Quiz Info]
  [Exam Info]
  [Regrades]
  [Honesty]
  [Additional Policies]
  [Announcements]

HW Assignments:
  [Hw1]  [Hw2]  [Hw3]
  [Hw4]  [Quizzes]

Practice Exams:
  [Mid 1]  [Mid 2]  [Final]

                           












HW#4 --- last modified February 07 2019 23:03:29..

Solution set.

Due date: Dec 5

Files to be submitted:
  Hw4.zip

Purpose: To gain experience with index compression methods, to learn about divergence from randomness, to study map reduce algorithms.

Related Course Outcomes:

The main course outcomes covered by this assignment are:

LO4 -- Give an example of how a posting list might be compressed using difference lists and gamma codes or Rice codes.

LO5 -- Demonstrate with small examples how incremental index updates can be done with log merging.

LO7 -- Know at least one Map Reduce algorithm (for example to calculate page rank).

Specification:

This homework consists of two parts: A set of book problems, which are to be done individually, should be handwritten (no typing), and must be turned in at the start of class on the day indicated; and a coding assignment. For this portion only one team member needs to submit -- just be sure to mention the names of your team mates in the documentation for your code.

The problems I want you to do are: Exercise 6.6 and 6.9 (due Nov. 14), Exercise 7.2 and 7.3 (due Nov. 21), Exercise 9.1 and 9.2 (due Nov 28), Exercise 14.6 and 14.7 (due Dec. 5).

For the coding part of your homework I want you to extend your code from Homework 3 to support the divergence from randomness scoring method (DFR). I also want you to rewrite the code for your posting lists to make use of one the posting list compression techniques we have spoken about in class (LLRUN, gamma, delta, Rice, etc). Finally, I want you to implement a log merge algorithm extension to HW3.

Once you have implemented DFR, redo your trec_eval experiments from HW3 for DFR, put your result analysis , file used, and how you conducted your experiments in experiments.txt, include with your submission all the trec files you used. For the index compression part, when you make the index I want you to write the result to disk. I want you to code an auxiliary program called index_viewer which can be run from the command line with two arguments the file name of an index that you have written together with a word to look up. This program should then pretty-print the posting list entries for that word (i.e., a string of 1's and 0's showing what the compressed posting looked like, followed by what the uncompressed data looked like). For the log merge part of the coding assignment, I want you to assume that you only have enough memory to hold 5 documents at a time in memory (this is also at merge time and you can assume documents are all roughly the same length). I want you to code your program so that it uses the disk to keep indexing according to Figure 7.7. Furthermore, I want you to sleep 5 seconds between after you merge any pair of files in this algorithm, so that I can see the actually merge process by watching the folder involved.

Include with your homework a readme.txt file containing all the team members names and ids, as well as a brief description about how to find and run things for each of the four parts of this assignment.

Point Breakdown

Book problems (1pt each graded on scale 0, 1/2 partial, 1 completely correct) 4pts
DFR extension to HW3 works as described1pt
trec_eval experiments1pt
Index compression code works (1pt), index_viewer is as described (1pt)2pts
Logarithmic merge code works as described2pts
Total10pts